review spam
A Unified Model for Unsupervised Opinion Spamming Detection Incorporating Text Generality
Xu, Yinqing (The Chinese University of Hong Kong) | Shi, Bei (The Chinese University of Hong Kong) | Tian, Wentao (The Chinese University of Hong Kong) | Lam, Wai (The Chinese University of Hong Kong)
Unlike other forms of spamming, it is difficult to collect a large amount of gold-standard labels for reviews Many existing methods on review spam detection by means of manual effort. Thus, most of these methods considering text content merely utilize simple text [Mukherjee et al., 2013; Li et al., 2013a; Sun et al., features such as content similarity. We explore a 2013] just rely on the ad-hoc or pseudo fake or non-fake novel idea of exploiting text generality for improving labels for model training, such as the labels annotated by spam detection. Besides, apart from the task the Amazon anonymous online workers [Ott et al., 2011; of review spam detection, although there have also Li et al., 2014]. On the other hand, some unsupervised been some works on identifying the review spammers methods have been proposed to detect the individual review (users) and the manipulated offerings (items), spammer [Mukherjee et al., 2013; Lim et al., 2010; no previous works have attempted to solve these Wang et al., 2011] and review spammer groups [Mukherjee et three tasks in a unified model. We have proposed al., 2012]. In addition, time series pattern [Xie et al., 2012], a unified probabilistic graphical model to detect rating distribution [Feng et al., 2012], reviewer graph [Wang et the suspicious review spams, the review spammers al., 2011], and reviewing burstiness [Fei et al., 2013] have also and the manipulated offerings in an unsupervised been applied to identify the review spams in an unsupervised manner.
- Asia > China > Hong Kong (0.04)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- North America > United States > New York (0.04)
- (4 more...)
Learning to Identify Review Spam
Li, Fangtao Huang (Tsinghua University) | Huang, Minlie (Tsinghua University) | Yang, Yi (Tsinghua University) | Zhu, Xiaoyan (Tsinghua University)
In the past few years, sentiment analysis and opinion mining becomes a popular and important task. These studies all assume that their opinion resources are real and trustful. However, they may encounter the faked opinion or opinion spam problem. In this paper, we study this issue in the context of our product review mining system. On product review site, people may write faked reviews, called review spam, to promote their products, or defame their competitors' products. It is important to identify and filter out the review spam. Previous work only focuses on some heuristic rules, such as helpfulness voting, or rating deviation, which limits the performance of this task. In this paper, we exploit machine learning methods to identify review spam. Toward the end, we manually build a spam collection from our crawled reviews. We first analyze the effect of various features in spam identification. We also observe that the review spammer consistently writes spam. This provides us another view to identify review spam: we can identify if the author of the review is spammer. Based on this observation, we provide a two-view semi-supervised method, co-training, to exploit the large amount of unlabeled data. The experiment results show that our proposed method is effective. Our designed machine learning methods achieve significant improvements in comparison to the heuristic baselines.
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Florida > Palm Beach County > Boca Raton (0.04)
- North America > Canada (0.04)
- (2 more...)